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Abstract: This paper introduces a new method to estimate the spectral distribu- 
tion of a population covariance matrix from high-dimensional data. The method is 
founded on a meaningful generalization of the seminal Marcenko-Pastur equation, 
originally defined in the complex plan, to the real line. Beyond its easy implemen- 
tation and the established asymptotic consistency, the new estimator outperforms 
two existing estimators from the literature in almost all the situations tested in a 
simulation experiment. An application to the analysis of the correlation matrix of 
S&P stocks data is also given. 
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1 Introduction 

Let xi, . . . ,x„ be a sequence of i.i.d. zero-mean random vectors in or C, with a common 
population covariance matrix Ep. When the population size p is not negligible with respect to 
the sample size n, modern random matrix theory indicates that the sample covariance matrix 



does not approach Ep. For instance, in a simple case where Ep = Jp (identity matrix) , the 
eigenvalues of S„ will spread over an interval approximately equal to (1 =F sjpjrif' around the 
unique population eigenvalue 1 of Ep (Marccnko and Pastur (1967), Yin et al. (1988) and Bai 
and Yin (1993)). Therefore, classical statistical procedures based on an approximation of Ep 
by Sn become inconsistent in such high dimensional data situations. 
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To be precise, let us recall that the spectral distribution (SD) of an m x m Hermitian 
matrix (or real symmetric) A is the measure generated by its eigenvalues {A^}, 




where Sb denotes the Dirac point measure at 6. Let ((7i)i<i<p be the p eigenvalues of the 
population covariance matrix Ep. We are particularly interested in the following SD 




Following the random matrix theory, both sizes p and n will grow to infinity. It is then natural 
to assume that Hp weakly converges to a limiting distribution H when p ^ oo. We refer this 
limiting SD H as the population spectral distribution (PSD) of the observation model. 

The main observation is that under reasonable assumptions, when both dimensions p and 
n become large at a proportional rate say c, almost surely, the (random) SD G'^" of the sample 
covariance matrix S„ will weakly converge to a deterministic distribution F, called limiting 
spectral distribution (LSD). Naturally this LSD F depends on the PSD H, but in general this 
relationship is complex and has no explicit form. The only exception is the case where all the 
population eigenvalues (ai) are unit, i.e. Ep = Ip [H = 5{); the LSD F is then explicit known 
to be the Marcenko-Pastur distribution with an explicit density function. For a general PSD 
H, this relationship is expressed via an implicit equation, see Section 3, Eqs. (1) and (3). 

An important question here is the recovering of the PSD H (or Hp) from the sample 
covariance matrix Sn- This question has a central importance in several popular statistical 
methodologies like Principal Component Analysis (Johnstone (2001)), Kalman filtering or In- 
dependent Component Analysis which all rely on an efficient estimation of some population 
covariance matrices. 

Recently, El Karoui (2008) has proposed a variational and nonparametric approach to this 
problem based on an appropriate distance function using the Marcenko-Pastur equation (1) 
below and a large dictionary made with base density functions and Dirac point masses. The 
proposed estimator is proved consistent in a nonparametric estimation sense assuming both the 
dictionary size and the number of observations n tend to infinity. However, no result on the 
convergence rate of the estimator, e.g. a central limit theorem, is given. 

In another important work Rao et al. (2008), the authors propose to use a suitable set of 
empirical moments, say the first q moments: for k — 1, . . . , g, — p~^ tr = p~^ X]f=i ^'i 
where (A;) are the eigenvalues of Sn (assuming p < n). Here a pure parametric approach is 
adopted and the PSD depends on a set of real parameters 9: H — H{6). Therefore, when 
n — >■ CO and under appropriate normalization, the sample moments {&k) will have a Gaussian 
limiting distribution with asymptotic mean and variance {me, Qo} which are functions of 
the (unknown) parameters 0. In Rao et al. (2008), the authors propose an estimator On of 
the parameters by maximizing the asymptotic Gaussian likelihood of a = {&j)i<j<q, with 
distribution NqimgjQe). Intensive simulations illustrate the consistency and the asymptotic 
normality of this estimator. However, their simulation experiments are limited to simplest 
situations and no theoretic result are provided concerning the consistency of the estimator. An 
important difficulty in this approach is that the functions mg and Qe have no explicit form. 

In a recent work Bai ct al. (2010), a modification of the procedure in Rao et al. (2008) 
is proposed to get a direct moments estimator based on the sample moments (cij). Compared 
to El Karoui (2008) and Rao et al. (2008), this moment estimator is simpler and much easier 
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to implement. Moreover, the convergence rate of this estimator (asymptotic normahty) is also 
established. A recent paper by the authors in Chen et al. (2010) has also analyzed the underlying 
order selection problem and proposed a solution based on the cross-validation principle. 

However, despite all the above contributions, there is still a need for new methods of 
estimation. Actually, the general approach in El Karoui (2008) has several implementation 
issues that seem to be responsible for its relatively low performance as attested by the very 
simple nature of provided simulation results. This low efficiency is probably due to the use of a 
too general dictionary made with large number of discrete distributions and piece-wisely linear 
densities. Concerning the moment based methods in Rao et al. (2008) and Bai et al. (2010), we 
will see that their accuracy degrades drastically as the number of parameters to be estimated 
increases. Lastly, it is well known that the contour-integral based method in a related work 
Mestre (2008) is limited to a small class of discrete models where distinct population eigenvalues 
should generate non-overlapping clusters of sample eigenvalues. 

The new approach developed in this paper can be viewed as a synthesis of the optimization 
approach in El Karoui (2008) and the parametric setup in Bai et al. (2010). On one hand, we 
adopt the optimization approach and will prove that it is in general preferable to the moment 
approaches. On the other hand, using a generic parametric approach for discrete PSDs as 
well as continuous PSDs, we are able to avoid the aforementioned implementation difficulties 
in El Karoui (2008). Another important contribution from the paper is that the optimization 
problem has been moved from the complex plan to the real line by considering a characteristic 
equation (Marcenko-Pastur equation) on the real line. The obtained optimization procedure is 
then much simpler than the original one in El Karoui (2008). 

The rest of the paper is organised as follows. In the next section, we provide a Marcenko- 
Pastur equation defined on the real line which will be the corner-stone of our estimation method. 
This method is developed in Section 3 and we prove its strong consistency. Then, in Section 4, 
simulation experiments are carried out to compare the performance of three estimation methods 
under investigation. The last section collects proofs of main theorems. 

2 Marcenko-Pastur equation on the real line 

Throughout the paper, A^^^ stands for any Hermitian square root of a non-negative definite 
Hermitian matrix A. Our model assumptions are as follows. 

Assumption (a). The sample and population sizes n,p both tend to infinity, and in such a 
way that p/n — >■ c G (0,oo). 

Assumption (b). There is a doubly infinite array of i.i.d. complex- valued random variables 
(wij), i,j > 1 satisfying 

E{wii)=0, E(|ii;ii|^) = 1, 

such that for each p,n, letting Wn ~ (wij)i<i<p,i<j<n, the observation vectors can be repre- 
sented as Xj — Sy^w.j where w.j — {wij)i<i<p denotes the j-th column of W„. 

Assumption (c). The SD Hp of Ep weakly converges to a probability distribution // as n — > oo. 

The assumptions (a)-(c) are classical conditions for the celebrated Marcenko-Pastur theo- 
rem (Marcenko and Pastur (1967); Silverstein (1995), see also Bai and Silverstein (2010)). More 
precisely, under these Assumptions, almost surely, as n — ^ oo, the empirical SD F„ ~ G"^" of 
Sn, weakly converges to a (nonrandom) generalized Marcenko-Pastur distribution F. 

Unfortunately, except the simplest case where H = S\, the LSD F has no explicit form 
and it is characterized as follows. Let s{z) denote the Stieltjes transform of cF + (1 — c)(5o , 
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which is a one-to-one map defined on the upper half complex plan = {z £ C : Q{z) > 0}. 
This transform satisfies the following fundamental Marcenko-Pastur equation (MP): 

z = -^+c f—^dHit) , zeC+. (1) 
s[z) J 1 + ts{z) 

The above MP equation excludes the real line from its domain of definition. As the first 
contribution of the paper, we fill this gap by an extension of the MP equation to the real line. 
The estimation method introduced in Section 3 will be entirely based on this extension. 

The support of a distribution G is denoted by Sq and its complementary set by Sq, since 
the ESD Fn is observed, we will use s^, the Stieltjes transform of {p/n)Fn -I- (1 — p/n)5o to 
approximate s in the MP equation. More precisely, let for it G R, 

, s 1 ~ p/n 1 1 

SnW = — + -J2-y ■ 2 

u n Ai — u 

It is clear that the domain of s„(u) is Sp^. Thus, s^(ti)'s are well defined on U for all large n, 
where U is the interior oi U — liminf„_>cx) Sp^ \ {0}. 

Theorem 2.1. Assume that the assumptions (a)-(h)-(c) hold. Then 

(1) for any u £ U, s^iu) converges to s{u), 

(2) for any u G Sp, s = s(m) is a solution to equation 

1 f t 



(3) the solution is also unique m the set — {s(z K\{0} : du/ds > 0, (— s) ""^ G Sjj}, 

(4) for any non-empty open interval [a, b) C B'^ , H is uniquely determined by u{s), s G (a, b). 
The proof is given in the last section. Some remarks are in order. 

1. Notice that since (— cxd,0) C U C Sp, there are infinitely many u-points such that s^{u) 
almost surely converges to s(u). 

2. The MP equation (3) can be inverted in the following sense: the knowledge of u{s) on 
any interval in B^ (see Figure 1) will uniquely determine the PSD H. The estimation 
method in Section 3 will be built on this property. 



3 Estimation 
3.1 The method 

We consider the estimation problem in a parametric setup. Suppose H = H{6) is the limit of 
Hp with unknown parameter vector 6 £ Q GM.'' . The procedure of the estimation of H includes 
three steps: 

SI. Choose a u-net {iti, . . . ,Um} from U, where itj's are distinct and the size m is no less 
than q. 
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Figure 1: The curve oi u — u{s) (solid thin), and the sets and Sp (sohd thick) for 
= 0.3(52 + 0.4(57 + O.3(5io and c 0.1. u, ==-it(sj),s, e B+,i = 1,2,3,4. 



S2. For each Uj, calculate s^{uj) using (2) and plug the pair into the MP equation (3). Then, 
we obtain m approximate equations 

1 p r tdH{t,e) 
'^^ ~ s„(itj) nj l + ts„(uj) 

~ ^*j(Snj.^) 0' = 1,•••,"^)• 
S3. Find the least squares solution of 9, 




We name 9n as the least squares estimate (LSE) of 6. Accordingly, H — H{9n) is called 
the LSE of H. A central issue here is the choice of the u-net . . . , Um}- In Section 4, we will 
provide a robust method for this choice that can be used in practice with real data. 

This procedure can also be applied to the MP equation (1) in complex field as in El Karoui 
(2008). Similarly to our first two steps, the author chose a z-net from and created a system 
of approximate equations by a discretisation H as a weighted sum of a grid of pre-chosen 
mass points. The estimates of the weight parameters were then obtained by minimizing the 
approximation errors in terms of the Loo norm. The author also suggested to use a z-net with 
K(z) < and "^{z) near 0. This is almost equivalent to choosing a u-net with ii < in our 
procedure. But we strongly suggest to use more u-points from U n R"*" if possible, since these 
points are likely to carry some different information about H comparing with negative u-points. 
For the optimization step, whatever the distance used (L2-norm, Loo-norm, etc.) our method 
would be easier and faster than El Karoui's one since the optimization is carried on the real 
domain. 

3.2 Consistency 

We establish the strong consistency of our estimator in two models that are widely used in the 
literature. The estimates will be further studied in the simulation section. 
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The first model is made with discrete PSDs with finite support on R^, i.e. 

H{e) = mi5ai H [-nikSak, 6 € 0, 

where nik = 1 — X]f=i nii, 6 — (ai, . . . , a^, mi, . . . , irik-i) are {2k — 1) miknown parameters and 



i = l 



e = <^ 6* G R^* ^ : mi > 0, ^ mi = 1; < ai < • ■ • < afc < +oo 
Here, Equation (3) can be simplified to 

1 



k 

airrii 

u . ^ - 



. - + aiS 
i—i ~ 

For the well-definition of the equation on O, we assume that the it-net satisfies 

inf min |1 -I- ais(uj)\ > 5, (4) 
fe© i.j 

where 5 is some positive constant. It is clearly satisfied if all the uj's are negative. 

Theorem 3.1. In addition to the assumptions (a)-(b)-(c) , suppose that the true value of the 
parameter 6o is an inner point of O and the condition (4) is fulfilled. Then, the LSE 9„ for the 
discrete model is strongly consistent, that is, almost surely, 6n ^ 9o- 

Next we suppose that the PSD H{6) has a probability density h{t\9) with respect to 
Lebesgue measure. From Szego (1959) (Chapters 2, 4), if h(t\9) has finite moments of all order, 
it can be expanded in terms of Laguerre polynomials: 



Mi|e)=;^c,V,(t)e-', 

where 

c^^jMmmdt. 

As discussed in Bai et al. (2010), we consider a family of h{t\6) with finite expansion 

9 9 

H-tW ^^CjiPj{t)e'* = ^ajt^e~\ t > 0, 9 € 0, 

3=0 j=0 

where ao = 1 — ai — • • • — ^io^g, 9 — (ai, . . . , ctq), and 

e = {61 G R« : h{t\e) >0, t€ R+}. 

For this model. Equation (.3) becomes 

1 ^ f t^+ig-* , 
u = l-c> a-j at. 

It's clear that the calculation of 9n is here simple since the above equation is linear with respect 

to e. 

Theorem 3.2. In addition to the assumptions (a)-(b)-(c) , suppose that the true value of the 

parameter 6o is an inner point of B. Then, the LSE 9n for the continuous model is strongly 
consistent. 
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Table 1: Wasserstein distances of estimates for H = O.S^i + 0.5(52. 

p/n — 0.2 p/n = 1 p/n = 2 



LSE Mean 0.0437 0.0601 0.0893 
S.D. 0.0573 0.0735 0.1077 



RMSE Mean 0.0491 0.0689 0.0859 
S.D. 0.0320 0.0482 0.0629 



BCY Mean 0.0500 0.0664 0.0871 
S.D. 0.0331 0.0466 0.0617 



4 Simulation experiments 



In this section, simulations are carried out to compare our LSE witii the approximate quasi- 
likelihood estimate in Rao et al. (2008) (referred as RMSE) and the moment estimate in Bai et 
al. (2010) (referred as BCY). We do not include the estimator of El Karoui (2008) in this study 
since this estimator is nonparametric using a suitable approximation dictionary while the LSE 
is based on a parametric form of unknown PSDs. 

We study five different PSDs: three of them are discrete and two continuous. Samples 
are drawn from mean-zero real normal population with the dimensions n — 500 and p — 
100, 500, 1000. Statistics are computed from 1000 independent replications. 

To evaluate the quality of an estimate H = H{9), instead of looking at individual values (Oi) 
of the parameters, we use a global distance, namely the Wasserstein distance W = J IQnit) — 
Qg{t)\dt where Q^it) is the quantile function of distribution /i. The use of Wasserstein distance 
is motivated by the fact that it applies to both discrete and continuous distributions (unlike 
other common distance like kuUback-leibler or L2 distance). 

For the LSE, we need to choose a w-net from n Sp \ {0}. When H has finite support, 
the upper and lower bounds of Sf \ {0} can be estimated respectively by \max ~ max{Ai} and 
Amin = min{Ai : Ai > 0} where Ai's are sample eigenvalues. As a consequence, we design a 
primary set: 



Next, we choose I equally spaced it-points from each individual interval of U. We name this 
process as adaptive choice of it-net. Here we set I = 20 for all cases considered in simulation, 
that is, for example we take { — 10 -I- lOt/21, t = 1, . . . , 20} from the first interval. 

Case 1: H = 0.5(5i + 0.5(52- This is a simple case as H has only two atoms with equal 
weights. Table 1 shows that all the three estimates are consistent, and their efficiency is very 
close. 

Case 2: H = 0.35i -I- O.4J3 -|- O.SSs. In this case, we increase the order of H. Analogous 
statistics are summarized in Table 2. The results show that LSE clearly outperforms RMSE and 
BCY in the light of the Wasserstein distance. Particularly, RMSE and BCY have not converged 
yet with dimensions n — 500 and p — 500, 1000, while LSE only contains a small bias in such 
situations. This exhibits the robustness of our method with respect to the increase of the order. 

Case S: H — O.SSi + OASs + 0.3(5i5. In this case, we increase the variance of H. Table 3 
collects the simulation results. Compared with Table 2, RMSE and BCY deteriorate significantly 




'max , 



lOAmai) (discrete model, p ^ n), 
(discrete model, p = n), 
(continuous model). 
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Table 2: Wasserstein distances of estimates for H = 0.35i + QA S^ + O.365. 

p/n — 0.2 p/n = 1 p/n — 2 



LSE 


Mean 


0.1589 


0.3566 


0.4645 




S.D. 


0.1836 


0.4044 


0.5156 


RMSE 


Mean 


0.2893 


0.7494 


0.8153 




S.D. 


0.0966 


0.2188 


0.1080 


BCY 


Mean 


0.2824 


0.5840 


0.7217 




S.D. 


0.1769 


0.2494 


0.2156 



Table 3: Wasserstein distances of estimates for H = 0.3Si + OAS5 + O.S^is. 







p/n = 0.2 


p/n = 1 


p/n = 2 


LSE 


Mean 


0.1756 


0.2524 


0.5369 




S.D. 


0.2105 


0.3013 


0.6282 


RMSE 


Mean 


0.7090 


1.4020 


1.9160 




S.D. 


0.0524 


0.6501 


0.2973 


BCY 


Mean 


0.9926 


1.5379 


1.8562 




S.D. 


0.5618 


0.6875 


0.7526 



while LSE remains stable. The average Wasserstein distances of LSE are (at least) a third less 
than those of RMSE and BCY for all p and n used. This demonstrates the robustness of our 
method with respect to the increase of the variance. 

Case 4: h(t) = (qq + ait)e~',ai = 1. This is the simplest continuous model with only 
one parameter to be estimated. In this case, _ff is a gamma distribution with shape parameter 
2 and scale parameter 1. Statistics in Table 4 show that all the three estimates have similar 
efficiency. 

Case 5: h{t) = (ao + ait + Q2t^ + a3t^)e~', ai = 02 = as = 1/9. This model with three 
parameters becomes more difflcult to estimate. RMSE and BCY have large bias and/or large 
standard deviations in all dimensions we used, see Table 5. In contrast, our LSE performs fairly 
well and again outperform these two moment based methods. 

In summary, the LSE outperforms the RMSE and BCY estimators in all the tested situa- 
tions. On the other hand, as expected, the performances of the RMSE and the BCY estimators 



Table 4: Wasserstein distances of estimates for h{t) = te 



p/n — 0.2 p/n — 1 p/n — 2 



LSE 


Mean 


0.0939 


0.0441 


0.0294 




S.D. 


0.0704 


0.0317 


0.0229 


RMSE 


Mean 


0.1126 


0.0508 


0.0346 




S.D. 


0.0839 


0.0393 


0.0262 


BCY 


Mean 


0.1168 


0.0491 


0.0348 




S.D. 


0.0881 


0.0361 


0.0268 
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Table 5: Wasserstein distances of estimates for h{t) = (t + t"^ + t^)e */9- 



p/n = 0.2 p/n = 1 p/n = 2 



LSE 


Mean 


0.1895 


0.0902 


0.0740 




S.D. 


0.1103 


0.0526 


0.0378 


RMSE 


Mean 


0.3163 


0.1515 


0.1156 




S.D. 


0.2062 


0.0863 


0.0670 


BCY 


Mean 


0.3139 


0.1554 


0.1114 




S.D. 


0.2007 


0.0907 


0.0624 



are very close since they are all based on empirical moments (however, as explained in Bai et 
al. (2010), the BCY estimator is much easier to implement). 

Finally, we analyze the relationship between the size of a u-net and the efficiency of LSE. 
The average of Wasserstein distances of LSE with respect to different I values (the number of 
u-points picked from each individual interval) is plotted for Case 3 and Case 5, see Figure 2. 
The results show that unless I is too small, the estimation efficiency remains remarkably stable 
with different values of 



Average of W-distances 



Average of W-distances 



OS - 
0.4 

03 - 



25 30 



Figure 2: The average of Wasserstein distances of LSE with respect to / = 5, 10, . . . , 30) 
for Case 3 (left) and Case 5 (right) with p = 100, n — 500 (sohd lines), p — 500, n = 500 
(dashed hnes), and p = 1000, n — 500 (dotted lines). 



5 Application to S&P 500 stocks data 

In this section, we present a financial application of our estimation procedure in analysing an 
empirical correlation matrix of stock returns. We study a set of 488 U.S. stocks included in the 
S&P 500 index from September, 2007 to September 2011 (1001 trading days, 12 stocks have 
been removed because of missing values). Here, the data dimension is p = 488 and the number 
of observations is n = 1000. 

Following Bouchaud and Potters (2009), we suppose that there is a PSD H{a) for the 
stock returns with an inverse cubic density h{t\a): 

h(t\a) = ^ I{t > a), 0<a<l, 
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where c = 2(l — q)^ and a — 2a — 1. Notice that when q — >■ 1^, the inverse cubic model tends 
to the MP case {H — Si), so that this prior model is very flexible. 

For the estimation procedure, we first remove the 6 largest sample eigenvalues which are 
deemed as spikes over the bulk of sample eigenvalues. As in Section 3, we use I = 20 equally 
spaced u-points in (-10,0). The LSE of a turns out to be a = 0.4380. The RMSE and BCY 
don't exist for this model for the reason that the moments of H don't depend on the unknown 
parameter. 

Limiting spectral densities corresponding to the LSE estimate /i(t|0.4380) and H = Si are 
shown in Figure 3. We also plot the empirical spectral density of the correlation matrix, and 
the curve is smoothed by using a Gaussian kernel estimate with bandwidth h = 0.05. 

1.4 
1.2 
1.0 
0.8 
0.6 
0.4 
0.2 



1 2 3 4 5 

Figure 3: The empirical density of the sample eigenvalues (plain black line), compared 
to the MP density (dashed line) and the limiting spectral density corresponding to the 
LSE estimate /i(t|0.4380) (dashed-dotted line). 

From Figure 3, we could see that the MP density is far away from the empirical density 
curve. This confirms a widely believed fact that the correlation matrix may have more structure 
than just several spikes on top of the identity matrix. By contrast, the cubic model with 
a = 0.4380 yields a much more satisfying fit to the empirical density curve. 

6 Proofs 

We first recall useful results in three lemmas. The first one is provided in Silverstein (1995) and 
the two others in Silverstein and Choi (1995). 

Lemma 6.1. Assume that the assumptions (a)-(h)-(c) hold. Then, almost surely, the empir- 
ical spectral distributton F„ converges m distribution, as n ^ oo, to a non-random probability 
measure F, whose Stieltjes transform s = s{z) is a solution to the equation 

s = f — ^ dH(t). 

J t(l - c - czs) - z ^ ' 

The solution is also unique m the set {s £ C : —(1 — c)/2 + cs G C"*"}. 
Lemma 6.2. If u £ Sp \ {0}, then s = s{u) satisfies 

(1) s G R \ {0}, (2) (-s)-' G S^H, (3) du/ds > 0. 

Conversely, if s satisfies (l)-(3), then u = u[s) G Sp \ {0}. 
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Lemma 6.3. Set B = {sG I^\{0} : s) ^ G Sh}. Let [sj^jSj], [13, S4] be two disjoint intervals 
in B satisfying for all s G [sj^jSj] U [sgjS^], du/ds > 0. Then [111,112], [113,^4] are disjoint where 
Ui = u(s-),i = 1,2,3,4. 



6.1 Proof of Theorem 2.1 

The first conclusion follows from two convergence theorem. In fact, for any fixed u £ U there are 
£o £ and no G Z+ such that U{u,eo) C n^^ngS'Fn \ i^}- This implies that for all x G Sf^ 
and n > no, we have |l/(a; — u)| < 1/eo- From this and Lebesgue's dominated convergence 
theorem, for any fixed u G (7, almost surely, 

s„iu) -> s(u), 

as n — >■ oo with p/n c > 0. By Vitali's convergence theorem (Titchmarsh, 1939, Page 168), 
we may conclude that converges almost surely for every u £ U. 

Next, we consider the second conclusion. For any fixed u G Sp and e > 0, let z = u + ei, 
from Lemma 6.1, 3(2) satisfies (1). On the other hand, according to Lemma 6.2, {—s{u))~^ G Sjj 
and thus \t/{l + ts{z))\ is bounded on the set {{t,e) : Sh x [0, 1]}. Therefore, by Lebesgue's 
dominated convergence theorem, taking the limit as e — >■ 0^ on both sides of (1) gets the 
conclusion. 

Conclusion 3 follows from the results of Lemma 6.2 and Lemma 6.3. In fact, let Ug+{s) 
be the restriction of u{s) to B^ , then Lemma 6.2 shows that the range of Ug+{s) is Sp \ {0}. 
Lemma 6.3 indicates that (s) is also an injection. Therefore, Ug+ (s) is a bijection from B"*" 
to S^\{0}. 

As to the last conclusion, suppose H\{t) and H2{t) are two population spectral distribution 
functions satisfying, for all s G (a, 6), 

* -dm{t) = f ^^dH2{t). (5) 



1 + ts ' ' J 1 + is 

We are going to show Hi = H2 almost everywhere with respect to Lebesgue measure on R. 
For any Sq G {a,b), —1/sq is an inner point of S^, then there is Jo > such that 

Ui~l/so,So/\s,\) C S^H, 

which implies |1 + ts^] > Sq for all t G Sh- Choose eo = iiiin{|sQ|5o/(l + So),b — Sq,Sq — a}. 
Then, for any s G f/ (so, eo), 1 + ts has the same sign as 1 + tSg. Define 

(1 + tSo > 0, ?i > 0), 
g{t,u) = <; -1 (l + tsa <0, u< 0), 
{u{l + tsg)u < 0). 



We have then 



t 

1+ts 



/+00 
«)e~'"'"''*+-'"d«, sG U{Sf^,eo)- 
- oc 



Therefore, each side of (5) can be expressed as 
* -dH,(t) = 



ts 



/ + OO /"OO 
/ g{t,u)e-^^+^o>dH,{t)e-^^-^«'>"du. (6) 
-00 Jo 
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It is clear that the left hand side of (6) is the Laplace transform of 

' g(t,u)e-^^+^o^''dH,{t). 
By the uniqueness of Laplace transform, we have then 

1 r°° 1 

g{t,u)e-^''dHi{t) ^ / g{t,u)e-T^dH2{t), 
Jo 

and thus Hi — H2 almost everywhere. 

6.2 Proof of Theorem 3.1 

Define 

where Sj — s{uj) {j — 1, . . . , m). We first state and prove the following proposition. 

Proposition 6.1. Ifui,..., Um o.re distinct and m > q — 2k — 1, then ^p{9) — for the discrete 
model has a unique solution 6o on Q. 



Proof. Since s{u) is a bijective function from 5*^ to B'^ and ui, . . . , Um. are distinct, si, . . . , Sm 
are also distinct. 

Suppose there is a. 6 = (ai, . . . , a^, mi, . . . , mfc_i) such that 1^2(6') = 0. Denote by 60 — 
{a'l, . . . , aj,, m'l, . . . , m'f._i) the true value of the parameter. We will show that 9 — 6o. Denote 
bi = 1/ai and b'^ = l/a'^ (i = 1, . . . , fc), we have then 



z — 1 i — l ' 

Now look Sj as a parameter s and reduction to common factors leads to 

fc k 

{s + b'O ■■■{s + b',) J2 m^ l[{s + be) = (s + fei) . . . (s + fo,.) ^ m[ JJis + b'^). 

These are polynomials of degree 2fc — 1; they coincide at m > 2fc — 1 different points s = Sj; 
they are then equal. Back to (7), we have now for all s 7^ —bi, — 6^, 

k k f 



Q 4- h, ^ 



a + bi s + b'. 

Now each bi should match one b'g, because otherwise bi 7^ b'e for all £ and by letting s — >■ —bi 
we get a contradiction. So there is one b'^ matches (then unique) for bi. This proves also that 
mi = m'l. As the bi are ordered, it is necessary that b'^ — b'i and hence also rUi — m'i. □ 

Now let's begin the proof of Theorem 3.1. Recall that 

1 p f tdH{t, 6) ^ ^ 



argmm > Mij H -, — r / 



~ arg minion (6'). 
see 
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Under the assumption of the theorem, by the convergence of s^{uj) {j — 1, . . . ,m), (p„{d) is 
well defined on O for all large n. Moreover, for any fixed ^ £ O, we have 

almost surely. Proposition 6.1 guarantees that 6* = is the unique solution to ip{0) = on O. 

We claim that for almost all oj, there is a compact set Q — Q{uj) C O which contains all 
Oniuj) for large n. It's easy to see that for all large n, tpn{0) is uniformly bounded on O and has 
continues partial derivatives with respect to 9. By the Vitali's convergence theorem, we get 



sup|v3„(6')-¥^(6»)l ^0. 
fee 



(8) 



For any £ > 0, by the continuity of ip{9), we have 



inf <^(0) > ^(0o) = 0. 

l«-»0ll>e 

see 



From this and (8), when n is large. 



inf (f„{9) > ipn(9o). 
see 



This proves that minimum point 9„ of ipn{9) for 9 £ O must be in the ball {||S — 6*0 11 < e}. 
Hence the convergence 9^ — > 9o. 

To complete the proof, it is sufficient to prove the claim, i.e. there is a compact set Q G Q 
such that for large n, 

iiif 'fi„{9) > (p„{9o). 

Suppose not. Then there exists a sequence {9i,l — 1,2,.. .} tending to the boundary dO of B 
such that lim;^oo 'fin{9i) < ipn{9o)- Under this situation, we only need to consider the following 
two cases. 

The first is that {9i} has a convergent sub-sequence, i.e. 9i^ ^ 9 £ dO, as k ^ oo, then it 
follows that 

< <p{9) = lim lim V5„(6»iJ < lim <p„{9o) = <^(6»o) = 0, 

n~>oo k — ^oc n^cG 

hence v'(^) ~ 0. By a similar technique used in the proof of Proposition 6.1, we may get 9 = 6*0, 
a contradiction. 

The second is that ||^;|| = (X]i=i '^li + X^t^i^ "t-?;)^^^ ^ ^o. Then we immediately know 
there exists an such that a^; — s> oo, as Z — >■ oo. Without loss of generality, suppose that 

aa ^ oo (1 < j < ki), 
Y^'^Li mu — 5> mo 

ttij — > ai < oo (fci + 1 < j < fc), 

mu — >■ nii (fci + l<j<fc — 1). 



We have then 



and thus 



< lim lim ip„{9i) < lim ip„{9o) = (p{9o) 

n—^oo l—^oo n — >-oo 



lim lim (^„(6li) = ( Zj 



1 — cmo 



+ C > 



i = fcl+l 



= 0. 
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If mo = then the problem is similar to the first case. Assume mo 7^ 0. Denote ( 
(a'l, . . . , aj,, m'l, . . . , m'(,_x), we have 

mo ^ ■y^ aiTTii ■yU a'im'i 
Sj _ ^ 1 + aiSj ^ 1 + a'sj ' 

for j = 1, . . . ,m. Now look Sj as a parameter s and multiplying common factors leads to 



s n (i+«n«)n(i+4.^)(^+ E itS 



'l=fcl+l !2 = 1 



These are polynomials of degree 2k — ki < 2k — 1; they coincide at m > 2fe — 1 different points 
s = Sj; they are then equal. Comparing their constant terms comes into conflict. 
The proof is then complete. 

6.3 Proof of Theorem 3.2 

The proof of this theorem is similar to the proof of Theorem 3.1. We only present the following 
proposition. 

Proposition 6.2. If ui, . . . , Um are distinct and m > q, then f{9) = for the continues model 
has a unique solution 6o on O. 

Proof. Suppose there is a ^ = (qi, . . . , a,) such that Lp{0) — 0. Denote by Oq — (a^, . . . , a'^.) the 
true value of the parameter. We will show that 6 = Oo- 

Define p{t, p) = /3o + ■ • ■ ,+Pqt\ where /3 = . . . , and /3o = l-E,Lii!/3i- 
We have then 

/ iTt^P^^' e')e-'dt = (j = 1, . . . , m), 

where 0* — 9 — 9o and Sj — s(itj). 

Suppose p{t,9*) = has qo (< q) positive real roots ti < . . . < tqg, and denote to — 
0, tqo+i ~ +00, then p(t, 9*) maintains the sign in each interval {ti^i,ti) {i — 1, . . . , go + !)■ By 
mean value theorem, we have 

90 + 1 

= / T^Pi^' 9')e-'dt = V / p{t, 9*)e-'dt (j = 1, . . . , m), 



—pit,9')e-dt^^^^l^_^pitX)e-\ 

where G {ti-i,ti) {i = I, . . . ,qo + 1). 

Now look Sj as a parameter s and reduction to common factors leads to 

90 + 1 Mi 
i=l Ijti J-l-i-l 

The left hand side is a polynomial of degree go — 1 < g— 1 (the coefficient of s'" = njlL"!^ Io° 9*)e~*dt - 
0); the equation has m > q different roots s = s^; the polynomial is then zero. Let s — {i = 

1, ... ,170 + 1), we get 

p{t,9')e'''dt = (i = l,...,po + l), 



which is followed by p{t, 9*) = Q, and thus 9* = Q. □ 
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